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Abstract 

Summary: 

CytoSaddleSum provides Cytoscape users with access to the functionality of SaddleSum, a functional enrichment tool 
based on sum-of- weight scores. It operates by querying SaddleSum locally (using the standalone version) or remotely 
(through an HTTP request to a web server). The functional enrichment results are shown as a term relationship 
network, where nodes represent terms and edges show term relationships. Furthermore, query results are written as 
Cytoscape attributes allowing easy saving, retrieval and integration into network-based data analysis workflows. 

Availability: 

|www.ncbi.nlm .nih.gov/CBBresearch7Yu/d ownloads| 
The source code is placed in Public Domain. 

Contact: 

|yyu@ncbi.nlm.nih.gov| 



*to whom correspondence should be addressed 



1 Introduction 



CytoSaddleSum is a Cytoscape ( jSmoot et al.\ |2011| l plugin to access the functionality of SaddleSum, an enrichment 
analysis tool based on sum-of-weights-score ( Stojmirovic and Yu 2010| l. Unlike most other enrichment tools, Saddle- 
Sum does not require users to directly select significant genes or perform extensive simulations to compute statistics. 
Instead, it uses weights derived from measurements, such as log-expression ratios, to produce a score for each database 
term. It then estimates, depending on the number of genes involved, the P-value for that score by using the saddle- 
point approximation ( [Lugannani and Rice]|1980| to the empirical distribution function derived from all weights. This 
approach was shown ( Stojmirovic and Yu 2010 1 to yield accurate P-values and internally consistent retrievals. 

As a popular and flexible platform for visualization, integration and analysis of network data, Cytoscape allows 
gene expression data import and hosts numerous plugins for functional enrichment analysis. However, none of these 
plugins are based on the 'gene set analysis approach' that takes into account gene weights. Therefore, to fill this 
gap, we have developed CytoSaddleSum, a Cytoscape interface to SaddleSum. To enable several desirable features of 
CytoSaddleSum, however, we had to significantly extend the original SaddleSum code (see descriptions below). 



2 Implementation 



While CytoSaddleSum is implemented in Java using Cytoscape API, it functions by ranning either locally or re- 
motely a separate instance of SaddleSum, written in C. In either mode, CytoSaddleSum takes the user input through 
a graphical user interface, validates it, and passes a query to SaddleSum. Upon receiving the entire query results, 
CytoSaddleSum stores them as the node and network attributes of the newly-created term relationship graph. Con- 
sequently, the query output can be edited or manipulated within Cytoscape. Furthermore, saving term graph through 
Cytoscape also preserves the results for later use. 

The most important extension to SaddleSum involved construction of extended term databases (ETDs). Each ETD 



contains the mappings of genes to Gene Ontology (Gene Ontology Consortium 20101 terms and KEGG (Kanehisa 
et al. 2008 1 pathways, as well as an abbreviated version of the NCBI Gene (Maglott et al. 2011[ l database for all 
genes mapped to terms. Thanks to the latter, when using an ETD, SaddleSum is able to interpret the provided gene 
labels as NCBI Gene IDs, as gene symbols and as gene aliases. Each ETD also contains relations among terms that 
are used by SaddleSum for term graph construction. 



3 Usage 

CytoSaddleSum operates on the currently selected Cytoscape network whose nodes represent genes or gene products. 
The queries are submitted through the query form embedded as a tab into the Cytoscape Control Panel, on the left of 
the screen. The selected network must contain at least one node mapped to a floating-point Cytoscape attribute, which 
would provide node weights. CytoSaddleSum considers only the selected nodes within the network. The user can 
select the weight attribute through a dropdown box on the query form. Any selected node without specified weight is 
assumed to have weight 0. The user-settable cannonicalName attribute, automatically created by Cytoscape for each 
network node, serves as the gene label. 
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Figure 1 : CytoSaddleSum user interface consists of the query form (left), the results panel (right) and the term rela- 
tionship network (center), which here partially covers the original network. The results stored as attributes of the term 
network can be edited through Cytoscape Data Panel. 



After selecting the network and the nodes within it, the user needs to select a term database and set the statistical 
and weight processing parameters. The latter enable users to transform the supplied weights within SaddleSum. This 
includes changing the sign of the weights, as well as applying a cutoff, by weight or by rank. All weights below 
the cutoff are set to 0. The statistical parameters are E-value cutoff, minimum term size, effective database size and 
statistical method. We define the effective database size as the number of terms in the term database that map to at 
least k genes among the selected nodes, where k is the minimum term size. Apart from the default 'Lugannani-Rice' 
statistics, it is also possible to select 'One-sided Fisher's Exact test' statistics, which are based on the hypergeometric 
distribution. In that case, the user must select a cutoff under the weight processing parameters. 

To run local queries, a user needs the command-line version of SaddleSum and the term databases, both available 
for download from our website, and install them on the same machine that runs Cytoscape. The advantages of running 
local queries include speed, independence of Internet connection, and support of queries to custom databases in the 



GMT file format used by the GSEA tool ( |Subramanian et al. 2005) . Furthermore, the standalone program can be 
used outside of Cytoscape for large sets of queries. On the other hand, running remote queries require no installation 
of additional software, since queries are passed to the SaddleSum server over an HTTP connection. The disadvantage 
of running remote queries is that it can take much longer to run and that the choice of term databases is restricted to 
ETDs available only for some model organisms. 

CytoSaddleSum also displays warning or error messages reported by SaddleSum. For example, when a provided 
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gene label is ambiguous, depending on whether the ambiguity could be resolved, CytoSaddleSum will relay a warning 
or an error message reported by SaddleSum. CytoSaddleSum presents query results as a term relationship network 
(Fig.[T]), consisting of significant terms or their ancestors linked by hierarchical relations available in the term database. 
The statistical significance of each term is indicated by the color of its corresponding node. To facilitate browsing of 
the results, CytoSaddleSum generates a set of summary tables, which contain the lists of significant terms and various 
details about the query. These summary tables are embedded into Cytoscape Results Panel, on the right of the screen. 
Clicking on a significant term in a summary table will select that term in the term relationship network and select 
all nodes mapping to it in the original network. The results can be exported as text or tab-delimited files and can 
be restored from tab-deUmited files through the Export and Import menus of Cytoscape. Detailed instructions and 
explanations can be found in SaddleSum manual available from our website. 
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